Metric-Free Natural Gradient for Joint-Training of Boltzmann Machines
Authors
Abstract
This paper introduces the Metric-Free Natural Gradient (MFNG) algorithm for training Boltzmann Machines. Similar in spirit to the Hessian-Free method of Martens [8], our algorithm belongs to the family of truncated Newton methods and exploits an efficient matrix-vector product to avoid explicitly storing the natural gradient metric L. This metric is shown to be the expected second derivative of the log-partition function (under the model distribution), or equivalently, the covariance of the vector of partial derivatives of the energy function. We evaluate our method on the task of joint-training a 3-layer Deep Boltzmann Machine and show that MFNG does indeed have faster per-epoch convergence compared to Stochastic Maximum Likelihood with centering, though wall-clock performance is currently not competitive.
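The abstract names the two ingredients an implementation needs: a Monte Carlo estimate of the metric L as the covariance of per-sample energy gradients under the model distribution, and a linear solver that touches L only through matrix-vector products. The sketch below illustrates that combination in generic NumPy; the function names, the damping term, and the fixed iteration budget are illustrative assumptions, not the paper's actual code.

```python
import numpy as np

def metric_vector_product(energy_grads, v):
    """Estimate L @ v, where L is the covariance of per-sample energy
    gradients (a Monte Carlo estimate under the model distribution),
    without ever forming the D x D matrix L.

    energy_grads: (N, D) array; row n holds dE(x_n)/dtheta for a
                  model sample x_n.
    v:            (D,) vector.
    """
    g = energy_grads - energy_grads.mean(axis=0)  # center the gradients
    # L v = (1/N) G^T (G v): two mat-vecs, O(N*D) time and memory
    return g.T @ (g @ v) / g.shape[0]

def conjugate_gradient(matvec, b, n_iters=20, damping=1e-4, tol=1e-8):
    """Truncated linear CG for (L + damping*I) x = b, using only
    matrix-vector products with L (the 'metric-free' part)."""
    x = np.zeros_like(b)
    r = b.copy()                  # residual b - A x, with x = 0
    p = r.copy()
    rs = r @ r
    for _ in range(n_iters):
        Ap = matvec(p) + damping * p
        alpha = rs / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        if rs_new < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Usage sketch: G holds dE/dtheta for N model samples (one row each),
# grad is the ordinary gradient; the natural-gradient step solves
# (L + damping*I) x = grad without forming L.
# step = conjugate_gradient(lambda u: metric_vector_product(G, u), grad)
```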
Similar papers
Wasserstein Training of Restricted Boltzmann Machines
Boltzmann machines are able to learn highly complex, multimodal, structured and multiscale real-world data distributions. Parameters of the model are usually learned by minimizing the Kullback-Leibler (KL) divergence from training samples to the learned model. We propose in this work a novel approach for Boltzmann machine training which assumes that a meaningful metric between observations is g...
Wasserstein Training of Boltzmann Machines
The Boltzmann machine provides a useful framework to learn highly complex, multimodal and multiscale data distributions that occur in the real world. The default method to learn its parameters consists of minimizing the Kullback-Leibler (KL) divergence from training samples to the Boltzmann model. We propose in this work a novel approach for Boltzmann training which assumes that a meaningful me...
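Both Wasserstein abstracts hinge on replacing the KL objective with an optimal-transport distance that exploits a ground metric between observations. As a point of reference, here is a minimal entropy-regularized (Sinkhorn) Wasserstein distance between two empirical histograms; the regularization strength and iteration count are illustrative, and this smoothed variant is the one commonly used to make such objectives tractable, not code from either paper.

```python
import numpy as np

def sinkhorn_distance(a, b, C, reg=0.1, n_iters=200):
    """Entropy-smoothed Wasserstein distance between histograms a and b
    (each non-negative and summing to 1) under ground-cost matrix C[i, j].
    Standard Sinkhorn fixed-point iterations on the scaling vectors."""
    K = np.exp(-C / reg)               # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)              # match column marginals
        u = a / (K @ v)                # match row marginals
    P = u[:, None] * K * v[None, :]    # transport plan
    return float((P * C).sum())        # transport cost <P, C>
```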
How to Center Binary Deep Boltzmann Machines
This work analyzes centered binary Restricted Boltzmann Machines (RBMs) and binary Deep Boltzmann Machines (DBMs), where centering is done by subtracting offset values from visible and hidden variables. We show analytically that (i) centering results in a different but equivalent parameterization for artificial neural networks in general, (ii) the expected performance of centered binary RBMs/DB...
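To make the offset idea concrete, the following sketch shows the usual centered parameterization of a binary RBM, in which the offsets (typically the visible and hidden means) are subtracted inside the energy so that the conditionals and gradient statistics involve centered states. The class layout, initialization, and offset values are illustrative assumptions, not code from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class CenteredRBM:
    """Sketch of a centered binary RBM with energy
    E(v, h) = -(v - mu)^T W (h - lam) - b^T v - c^T h,
    where mu and lam are fixed offsets (e.g. the data and hidden means)."""

    def __init__(self, n_vis, n_hid, seed=0):
        rng = np.random.default_rng(seed)
        self.W = 0.01 * rng.standard_normal((n_vis, n_hid))
        self.b = np.zeros(n_vis)        # visible biases
        self.c = np.zeros(n_hid)        # hidden biases
        self.mu = np.full(n_vis, 0.5)   # visible offsets (illustrative)
        self.lam = np.full(n_hid, 0.5)  # hidden offsets (illustrative)

    def p_h_given_v(self, v):
        # hidden activation depends on the *centered* visible state
        return sigmoid(self.c + (v - self.mu) @ self.W)

    def p_v_given_h(self, h):
        # visible activation depends on the *centered* hidden state
        return sigmoid(self.b + (h - self.lam) @ self.W.T)

    def grad_W(self, v, ph):
        # gradient statistics are likewise computed on centered states
        return (v - self.mu).T @ (ph - self.lam) / v.shape[0]
```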
Supplementary Material for: Factored Conditional Restricted Boltzmann Machines for Modeling Motion Style
In this document, we provide additional details for variants of Conditional Restricted Boltzmann Machines (CRBMs). Specifically, we focus on each of the four models compared in the Quantitative Evaluation (Sec. 4.4). We collect the formulae required for contrastive divergence learning of parameters, synthesis from a trained model by alternating Gibbs sampling, and forward prediction from a traine...
Average Contrastive Divergence for Training Restricted Boltzmann Machines
This paper studies the contrastive divergence (CD) learning algorithm and proposes a new algorithm for training restricted Boltzmann machines (RBMs). We show that CD is a biased estimator of the log-likelihood gradient and analyze the bias. We then propose a new learning algorithm, average contrastive divergence (ACD), for training RBMs. It is an improved CD algorith...
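For context on the baseline being improved, the snippet below sketches standard CD-k for a binary RBM: the positive phase uses the data, while the negative phase uses k Gibbs steps started at the data, which is exactly what biases the resulting gradient estimate. The averaging refinement the paper proposes is not reproduced here, since the abstract above is truncated.

```python
import numpy as np

def cd_k_gradient(v0, W, b, c, k=1, rng=None):
    """Biased CD-k estimate of the log-likelihood gradient of a binary
    RBM with weights W (n_vis, n_hid), visible biases b, hidden biases c.
    v0: (N, n_vis) batch of binary training samples."""
    rng = rng or np.random.default_rng(0)
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

    ph0 = sigmoid(c + v0 @ W)               # positive phase (data)
    vk = v0
    for _ in range(k):                      # k steps of block Gibbs
        h = (rng.random(ph0.shape) < sigmoid(c + vk @ W)).astype(float)
        vk = (rng.random(v0.shape) < sigmoid(b + h @ W.T)).astype(float)
    phk = sigmoid(c + vk @ W)               # negative phase (samples)

    n = v0.shape[0]
    dW = (v0.T @ ph0 - vk.T @ phk) / n
    db = (v0 - vk).mean(axis=0)
    dc = (ph0 - phk).mean(axis=0)
    return dW, db, dc
```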
Journal: CoRR
Volume: abs/1301.3545
Publication date: 2013